Predicting protein secondary structure is a fundamental problem in protein structure prediction. Here we present a new supervised generative stochastic network (GSN) based method to predict local secondary structure with deep hierarchical representations. GSN is a recently proposed deep learning technique (Bengio & Thibodeau-Laufer, 2013) for globally training deep generative models. We present a supervised extension of GSN, which learns a Markov chain to sample from a conditional distribution, and apply it to protein structure prediction. To scale the model to full-sized, high-dimensional data, such as protein sequences with hundreds of amino acids, we introduce a convolutional architecture that allows efficient learning across multiple layers of hierarchical representations. Our architecture uniquely focuses on predicting structured low-level labels informed by both the low-level and high-level representations learned by the model. In our application, this corresponds to labeling the secondary structure state of each amino-acid residue. We trained and tested the model on separate sets of non-homologous proteins sharing less than 30% sequence identity. Our model achieves 66.4% Q8 accuracy on the CB513 dataset, better than the previously reported best performance of 64.9% (Wang et al., 2011) for this challenging secondary structure prediction problem.
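For readers unfamiliar with the metric, Q8 accuracy is commonly understood as the fraction of residues whose predicted eight-state secondary structure label matches the reference label. The following is a minimal sketch of that computation, not the evaluation code used for the reported results. The function name `q8_accuracy`, the pooling of residues across all proteins, and the use of `L` for the loop/other state are assumptions made for illustration.

```python
from typing import Sequence

# Eight-state alphabet in the DSSP style. Using 'L' for the loop/other
# state is an assumed convention; some datasets use 'C' or '-' instead.
Q8_STATES = frozenset("HGIEBTSL")


def q8_accuracy(predicted: Sequence[str], observed: Sequence[str]) -> float:
    """Return the fraction of residues with a matching 8-state label.

    Each element of `predicted` and `observed` is a string of state
    letters for one protein. Residues are pooled across all proteins
    (an assumption; per-protein averaging is another possible choice).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must list the same proteins")

    correct = 0
    total = 0
    for pred, obs in zip(predicted, observed):
        if len(pred) != len(obs):
            raise ValueError("label strings for a protein differ in length")
        for p, o in zip(pred, obs):
            if p not in Q8_STATES or o not in Q8_STATES:
                raise ValueError(f"unknown state label: {p!r} or {o!r}")
            correct += int(p == o)
            total += 1

    if total == 0:
        raise ValueError("no residues to evaluate")
    return correct / total


if __name__ == "__main__":
    # Toy labels, invented purely to exercise the function.
    pred = ["HHHHEELL", "GGTTSSBL"]
    obs = ["HHHEEELL", "GGTTSLBL"]
    print(q8_accuracy(pred, obs))  # 14 of 16 residues match: 0.875
```

Under this definition, a reported figure such as 66.4% would mean that roughly two out of three residues receive the correct eight-state label.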